Information and WAIC

Elizabeth King
Kevin Middleton

Frequentist model comparisons

  • Likelihood ratio tests
  • AIC or AICc
    • Akaike model weights
  • Cross validation

Information criteria and WAIC

  • Review Quantitative Methods lecture 10-3

Kullback-Leibler Divergence

No model represents the true process that generated the outcomes.

Information is the “distance” between a prospective model and the true model

  • Not strictly speaking a distance, because KL divergence is asymmetric:

\[D_{KL}(M_{true} \parallel M_1) \neq D_{KL}(M_1 \parallel M_{true})\]
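The asymmetry is easy to see numerically. A minimal sketch in R, using two small hypothetical discrete distributions (the probabilities are illustrative only):

```r
# KL divergence for discrete distributions: sum(p * log(p / q))
kl <- function(p, q) sum(p * log(p / q))

p <- c(0.1, 0.3, 0.6)  # stand-in for the "true" model
q <- c(0.2, 0.5, 0.3)  # stand-in for an approximating model

kl(p, q)  # D_KL(p || q)
kl(q, p)  # D_KL(q || p): a different value, so KL is not a true distance
```

Swapping the arguments gives a different number, which is why KL divergence is called a divergence rather than a distance.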

Kullback-Leibler Divergence

  • Information lost when trying to approximate the true model
  • Amount of “surprise” when a model predicts new data

KL divergence doesn’t aid directly in model evaluation, because we never know the true model

  • i.e., like a likelihood with nothing to compare it to

KL Divergence to elpd

  • In the real world we don’t know \(M_{true}\)
  • For each model estimate the expected log-predictive density (elpd)

\[\text{elpd}_{M_1} = E_{\tilde{y}}\!\left[\log p_{M_1}(\tilde{y} \mid y)\right]\]

elpd is estimated by information criteria or leave-one-out cross-validation (next lecture)

WAIC

  • “Widely Applicable Information Criterion”: Sumio Watanabe (2010)
  • Parallels to information criteria and AIC
    • Almost equivalent interpretation as long as the assumptions are met
  • Works well for large data sets
    • Fast
    • PSIS-LOO-CV preferred for smaller datasets
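The quantities that `waic()` reports below (`elpd_waic`, `p_waic`, `waic`) can be sketched from an \(S \times n\) matrix of pointwise log-likelihoods (\(S\) posterior draws, \(n\) observations). A minimal illustration with simulated values standing in for a real log-likelihood matrix:

```r
# Simulated S x n log-likelihood matrix, purely for illustration
set.seed(1)
S <- 4000
n <- 35
log_lik <- matrix(rnorm(S * n, mean = -1, sd = 0.2), nrow = S)

# Log pointwise predictive density: log of the mean likelihood per observation
lppd <- sum(log(colMeans(exp(log_lik))))

# Effective number of parameters: posterior variance of the log-likelihood
p_waic <- sum(apply(log_lik, 2, var))

elpd_waic <- lppd - p_waic  # penalized predictive density
waic <- -2 * elpd_waic      # deviance scale, comparable to AIC
```

This mirrors the structure of the `waic()` output below: an estimate of elpd, a complexity penalty, and WAIC on the deviance scale.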

Energy expenditure in naked mole rats

Fit different models to these data

  1. Mean: grand mean (no body mass)
  2. ANOVA: group means only (no body mass)
  3. OLS regression: body mass only, no grouping
  4. ANCOVA: intercepts varying
  5. ANCOVA: slopes varying and intercepts varying

1: Mean

fm1 <- lm(Energy ~ 1, data = M)

2: ANOVA

fm2 <- lm(Energy ~ Caste, data = M)

3: OLS regression

fm3 <- lm(Energy ~ Mass, data = M)

4: ANCOVA, intercepts varying

fm4 <- lm(Energy ~ Mass + Caste, data = M)

5: ANCOVA, intercepts and slopes vary

fm5 <- lm(Energy ~ Mass * Caste, data = M)

A family of models

options(brms.backend = "cmdstanr",
        mc.cores = 4)

fm1 <- brm(Energy ~ 1, data = M,
           prior = prior(normal(0, 3), class = Intercept), iter = 2e4, refresh = 0)
fm2 <- brm(Energy ~ Caste, data = M,
           prior = prior(normal(0, 3), class = b), iter = 2e4, refresh = 0)
fm3 <- brm(Energy ~ Mass, data = M,
           prior = prior(normal(0, 3), class = b), iter = 2e4, refresh = 0)
fm4 <- brm(Energy ~ Mass + Caste, data = M,
           prior = prior(normal(0, 3), class = b), iter = 2e4, refresh = 0)
fm5 <- brm(Energy ~ Mass * Caste, data = M,
           prior = prior(normal(0, 3), class = b), iter = 2e4, refresh = 0)

WAIC

waic(fm1)

Computed from 40000 by 35 log-likelihood matrix

          Estimate  SE
elpd_waic    -16.7 3.8
p_waic         1.7 0.5
waic          33.4 7.7

waic(fm2)

Computed from 40000 by 35 log-likelihood matrix

          Estimate  SE
elpd_waic    -17.6 4.2
p_waic         2.7 0.7
waic          35.2 8.4

2 (5.7%) p_waic estimates greater than 0.4. We recommend trying loo instead. 

Comparing WAIC

waic1 <- waic(fm1)
waic2 <- waic(fm2)
waic3 <- waic(fm3)
waic4 <- waic(fm4)
waic5 <- waic(fm5)

loo_compare(waic1, waic2, waic3, waic4, waic5)
    elpd_diff se_diff
fm4  0.0       0.0   
fm5 -0.1       0.6   
fm3 -2.5       2.4   
fm1 -7.0       3.9   
fm2 -7.9       3.9   

WAIC Model weights

  • Model weights convert the relative WAIC support for each model to a value between 0 and 1
  • Sum of all model weights is 1

model_weights(fm1, fm2, fm3, fm4, fm5, weights = "waic") |> 
  as.data.frame() |> 
  knitr::kable(col.names = "Weight", digits = 4)

WAIC Model weights

Weight
fm1 0.0005
fm2 0.0002
fm3 0.0387
fm4 0.4942
fm5 0.4664
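These weights follow the usual Akaike-weight construction applied to WAIC. A sketch using hypothetical WAIC values (not the fitted models above):

```r
# Hypothetical WAIC values for five models, for illustration only
waic_vals <- c(m1 = 40.3, m2 = 42.1, m3 = 31.3, m4 = 26.3, m5 = 26.5)

# Differences from the best (lowest-WAIC) model
d_waic <- waic_vals - min(waic_vals)

# Akaike-style weights: exponentiate half the difference and normalize
weights <- exp(-0.5 * d_waic) / sum(exp(-0.5 * d_waic))

round(weights, 4)  # weights sum to 1; the lowest-WAIC models dominate
```

Models close in WAIC split the weight nearly evenly, while models far from the best receive weights near zero.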

References